Three major concerns have been raised about my draft proposal for disc
structures and allocation strategies for NewCore. These notes describe some
possible approaches, and I would welcome further comments on these.


The issues are:

  1)  How is a small object extended if there is insufficient space in its
      zone?

  2)  The proposed maximum disc size of 128G is woefully inadequate.

  3)  Behaviour when the disc is nearly full is unsatisfactory.



1  Extending small objects when their home zone is full


The problem:

A request to extend a small object by n sectors is made when the zone in
which the object exists has less than n sectors free space available.

Although it is possible to turn a small object into a large object, it is not
possible to move a small object from one zone to another, and so this request
cannot be met.


Proposed solution:

Relax the "immutable object-id" constraint, and allow an object-id to change
when its size changes; this is acceptable, because the object's directory
entry will normally be available at the time a size change is requested.

It's now possible to extend a small object by moving it to another zone, thus
solving the problem.


As a corollory, large objects no longer need to be indirected through an
indirection table entry; instead, the flag which indicates whether the object
is small or large is moved to the object-id which now has the form:

   flag      other field(s)              meaning

    0      zone-id    entry-id         small object
    1          chunk-id                large object



2  Handling larger discs


The concern:

128G is only 64 times larger than discs currently available for less than
2000-00; so discs of this size may become common-place for desktop computers
within the life-time of NewCore.


A possible way out:

Assuming we retain the 32k sectors/zone limitation, the maximum disc size is
limited by:

  - acceptable sector size
  - 15-bit chunk-id
  - 15-bit zone-id
  - acceptable sizes for zone table and chunk table

With a sector size of 512 bytes, 32k sectors/zone gives a zone size of 16M.
If we retain a 15-bit zone-id, then this gives a maximum disc size of 512G
(and if we accept a sector size of 1k, we can reach 1T).

If we retain 16 chunks/zone, this means 512k chunks on the disc, thus needing
a 19-bit chunk-id.

This means that chunk table entries must be at least 3 bytes long - probably
4 for convenience - thus making the chunk table 2M in size. The zone table
becomes 64k in size.

Of course, it may be that RAM sizes increase as discs do, and that 2M for the
chunk table for a 512G disc is quite reasonable; but if not, then the chunk
table will have to be paged.


Suggestion:

Make sure that NewCore is coded in such a way that, say, 23-bit chunk-ids
can be accommodated, and that it is straightforward to insert a cache for
the chunk table.

Accept that disc sizes of the order of 1T will be the ultimate maximum, and
that the initial implementation may cope only with discs up to 128G.



3  Near-full behaviour


The concern:

When a disc is almost full, the maximum size object that can be created
depends on the maximum amount of free space available in any one "shareable
section", rather than on the total free space available.

More precisely, if the free space available in a shareable section is less
than the chunk size, then it can only be used for small objects.

This may confuse the user - who might reasonably expect to be able to create
any file whose size is less than the free space available on the disc.


Possible approaches:

a) Careful presentation*

Suppose the free space available in a shareable section is

     free-space = n*chunk-size + x   (where x is less than the chunk-size)

Instead of presenting the sum of "free-space" values as the free space
available, we add together:

    - n*chunk-size over all shareable sections
    - the maximum value of x over all shareable sections

In this way, we guarantee that the user can always create a file whose size
is less than the free space value displayed.

On the other hand, the free space value will now exhibit unusual
characteristics as the disc becomes full: suppose all shareable sections have
less than chunk-size free space, then the free space value may remain at a
constant small value for a long time (a bit like most cars' petrol gauges!).

[*A more careful presentation of this approach has just arrived by email from
  John Redford]


b) Modified allocation strategy

Try to avoid shareable sections with significant amounts of free space less
than chunk-size by trying to fill such sections up as soon as they appear.

In other words, give priority to filling up any "almost-full" section over
storing a small object close to its parent.


c) Fragmented objects

Introduce a new kind of object - the "fragmented object" - which is
represented as follows:

  - A bit in the indirection table entry identifies the object as fragmented.

  - The rest of the entry identifies an "index sector" which contains a list
    of (object-id, length) pairs which describe the fragments of the object.

Possible allocation rules are:

  - Only allocate an object as fragmented if there is no other way to do it.

  - Try to reduce the fragmentation of a fragmented object whenever a request
    is made to change its size, or when it is closed.


d) Indirected objects

Introduce a new kind of object - the "indirected object" - and additional
allocation strategies which are invoked when it is not possible to allocate
an object in any other way.

An indirected object is represented as follows:

  - A bit in the indirection table entry identifies the object as indirected.

  - The rest of the entry identifies a sector which contains an object-id
    identifying where the object is stored.

Note that there may be a chain of such indirections - where an indirected
object points to an object which is itself an indirected object which points
to another object ...

The indirected object concept makes it possible to move small objects from
one zone to another without needing to update the corresponding directory
entries, thus making it feasible to at least partly combine free space from
separate shareable sections.

However, we know that determining the best way to do this is a difficult
problem, and so the algorithm used must be prepared to give up.


Comments:

I favour (a), with a touch of (b) if (a) on its own turns out to be
unsatisfactory.

I'd very much like to avoid the complexities of (c) or (d).



    Mike

